FRX Multi-Lingual Omnifont Recognition Module

Module name:	FRX
Module identifier:	IG_REC_RM_FRX
Filling methods supported:	IG_REC_FM_OMNIFONT
Filters supported:	all filter elements
Trade-off supported:	none
Knowledge base files:	none
Training supported:	yes

The OMNIFONT_PLUS2W and OMNIFONT_PLUS3W recognition modules require the presence of this module.

Its associated files are:

baltic.shp	Frx shape pack (code page) file.
cyrillic.shp	Frx shape pack (code page) file.
greek.shp	Frx shape pack (code page) file.
latin1.shp	Frx shape pack (code page) file.
latin2.shp	Frx shape pack (code page) file.
turkish.shp	Frx shape pack (code page) file.
charsettable.chr
asciieng.lng	Frx language dictionary data file, used when multiple languages are selected.
czech.lng	Frx language dictionary data file.
danish.lng	Frx language dictionary data file.
dutch.lng	Frx language dictionary data file.
english.lng	Frx language dictionary data file.
finnish.lng	Frx language dictionary data file.
french.lng	Frx language dictionary data file.
german.lng	Frx language dictionary data file.
greek.lng	Frx language dictionary data file.
hungar.lng	Frx language dictionary data file.
italian.lng	Frx language dictionary data file.
norsk.lng	Frx language dictionary data file.
polish.lng	Frx language dictionary data file.
port.lng	Frx language dictionary data file.
russian.lng	Frx language dictionary data file.
spanish.lng	Frx language dictionary data file.
swedish.lng	Frx language dictionary data file.
turkish.lng	Frx language dictionary data file.
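This page lists the associated files but does not say how they are located at run time. Assuming they are deployed together in a single directory, a pre-flight check such as the following Python sketch can report missing files before recognition is attempted. The helper name `missing_frx_files` and the single-directory layout are assumptions, not part of the FRX API.

```python
from pathlib import Path

# File names copied from the list above. Keeping them all in one directory
# is an assumption made for this sketch.
SHAPE_PACKS = ["baltic.shp", "cyrillic.shp", "greek.shp",
               "latin1.shp", "latin2.shp", "turkish.shp"]
OTHER_FILES = ["charsettable.chr", "asciieng.lng"]
DICTIONARIES = ["czech.lng", "danish.lng", "dutch.lng", "english.lng",
                "finnish.lng", "french.lng", "german.lng", "greek.lng",
                "hungar.lng", "italian.lng", "norsk.lng", "polish.lng",
                "port.lng", "russian.lng", "spanish.lng", "swedish.lng",
                "turkish.lng"]


def missing_frx_files(install_dir):
    """Return the FRX associated files that are not present in install_dir.

    Hypothetical helper: it only checks for regular files by name.
    """
    base = Path(install_dir)
    return [name for name in SHAPE_PACKS + OTHER_FILES + DICTIONARIES
            if not (base / name).is_file()]
```

An empty result means every listed file was found. A non-empty result names exactly which shape packs or dictionaries still need to be deployed.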

Application Areas

This module recognizes machine-printed text, for example text from printed publications, laser or ink-jet printers, and electric typewriters. Output from mechanical typewriters in good condition may also be acceptable. It should also be used for letter-quality (LQ) and near-letter-quality (NLQ) output from dot-matrix printers.

Range of Characters

This module supports the recognition of the Latin, Greek, and Cyrillic alphabets, with enough accented letters to recognize the 54 languages listed under Multi-Lingual Language Support.

The characters are listed by category and in alphanumeric order, together with their code page values, in the Characters and Code Pages section.

Multi-Lingual Language Support

The language support of this module is based on the module's internal code pages, which contain characters from a related group of languages. The internal code pages of this module are American/European (Latin 1, 1252), Baltic (1257), Central-European (Latin 2, 1250), Cyrillic (1251), Greek (1253), and Turkish (1254).
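Python's standard `cp1250` to `cp1257` codecs implement the same Windows code pages, so they can illustrate which internal code page is able to represent a given piece of text. The sketch below only demonstrates code page coverage; `pages_containing` is a hypothetical helper and does not call FRX.

```python
# Hypothetical mapping from the internal code pages named above to Python's
# built-in codecs for the same Windows code pages.
CODE_PAGES = {
    "Latin 1 (1252)": "cp1252",
    "Baltic (1257)": "cp1257",
    "Latin 2 (1250)": "cp1250",
    "Cyrillic (1251)": "cp1251",
    "Greek (1253)": "cp1253",
    "Turkish (1254)": "cp1254",
}


def pages_containing(text):
    """Return the code pages that can represent every character in text."""
    names = []
    for name, codec in CODE_PAGES.items():
        try:
            text.encode(codec)  # raises if any character is outside the page
        except UnicodeEncodeError:
            continue
        names.append(name)
    return names
```

For example, `pages_containing("ř")` includes Latin 2 (1250) but not Latin 1 (1252), because the Czech letter ř has no slot in code page 1252.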

The module supports multi-language selection, but it recognizes a language combination reliably only when all of the selected languages belong to the same code page; languages from different code pages may not be recognized properly. For example, it correctly processes the combination of English, German, and Italian, because all three languages belong to the Latin 1 (1252) code page. However, if you specify both French and Czech, OMNIFONT_FRX may fail to recognize some accented characters of the Czech alphabet, because French belongs to Latin 1 (1252) and Czech to Latin 2 (1250). The following table lists the languages supported by FRX, grouped by code page.

Code page	Languages
Latin 2 (1250)	Polish, Czech, Hungarian, Romanian, Albanian, Croatian, Wend (Sorbian), Slovak, Slovenian
Cyrillic (1251)	Russian, Ukrainian, Byelorussian, Bulgarian, Macedonian, Serbian
Latin 1 (1252)	English, German, French, Spanish, Italian, Dutch, Swedish, Norwegian, Finnish, Danish, Portuguese, Portuguese (Brazilian), Catalan, Afrikaans, Aymara, Basque, Breton, Faroese, Friulian, Gaelic, Galician, Eskimo, Icelandic, Indonesian, Latin, Malaysian, Pidgin English, Swahili, Tahitian, Welsh, Frisian, Zulu
Greek (1253)	Greek
Turkish (1254)	Turkish, Kurdish (written in Latin alphabet)
Baltic (1257)	Estonian, Hawaiian, Latvian, Lithuanian
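The same-code-page rule can be expressed as a simple lookup. This Python sketch transcribes the table above into a dictionary and checks whether a proposed language selection falls within a single code page. The names `LANGUAGES_BY_CODE_PAGE`, `CODE_PAGE_OF`, and `shared_code_page` are hypothetical, and the check does not call FRX itself.

```python
# Transcribed from the table above (hypothetical constant name).
LANGUAGES_BY_CODE_PAGE = {
    1250: ["Polish", "Czech", "Hungarian", "Romanian", "Albanian", "Croatian",
           "Wend (Sorbian)", "Slovak", "Slovenian"],
    1251: ["Russian", "Ukrainian", "Byelorussian", "Bulgarian", "Macedonian",
           "Serbian"],
    1252: ["English", "German", "French", "Spanish", "Italian", "Dutch",
           "Swedish", "Norwegian", "Finnish", "Danish", "Portuguese",
           "Portuguese (Brazilian)", "Catalan", "Afrikaans", "Aymara", "Basque",
           "Breton", "Faroese", "Friulian", "Gaelic", "Galician", "Eskimo",
           "Icelandic", "Indonesian", "Latin", "Malaysian", "Pidgin English",
           "Swahili", "Tahitian", "Welsh", "Frisian", "Zulu"],
    1253: ["Greek"],
    1254: ["Turkish", "Kurdish (Latin alphabet)"],
    1257: ["Estonian", "Hawaiian", "Latvian", "Lithuanian"],
}

# Invert the table so each language maps to its code page.
CODE_PAGE_OF = {
    language: page
    for page, languages in LANGUAGES_BY_CODE_PAGE.items()
    for language in languages
}


def shared_code_page(languages):
    """Return the code page shared by every selected language, or None.

    None means the selection is empty or spans several code pages, in which
    case some accented characters may not be recognized properly.
    Raises KeyError for a language that is not in the table.
    """
    pages = {CODE_PAGE_OF[language] for language in languages}
    return pages.pop() if len(pages) == 1 else None
```

With this sketch, `shared_code_page(["English", "German", "Italian"])` returns 1252, while `shared_code_page(["French", "Czech"])` returns None, matching the examples in the text.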

Character Attributes

The omnifont recognition module can detect and report character attributes: bold, italic, and underlined text, in any combination. It can also detect character size and classify fonts into three broad categories: serif, sans serif, and monospaced.
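Applications that consume these attributes usually need a compact representation of them. The following Python sketch models the attributes described above as a combinable flag set plus a font-category enum. The type names, and the assumption that size is measured in points, are illustrative only and do not reflect the actual FRX output structures.

```python
from dataclasses import dataclass
from enum import Enum, Flag


class Style(Flag):
    """Style attributes; combine with | (for example Style.BOLD | Style.ITALIC)."""
    NONE = 0
    BOLD = 1
    ITALIC = 2
    UNDERLINE = 4


class FontFamily(Enum):
    """The three broad font categories described above."""
    SERIF = "serif"
    SANS_SERIF = "sans serif"
    MONOSPACED = "monospaced"


@dataclass(frozen=True)
class CharAttributes:
    """Hypothetical per-character attribute record."""
    style: Style
    size: float  # assumed to be in points for this sketch
    family: FontFamily

    def describe(self):
        """Return a short human-readable summary of the attributes."""
        styles = [s.name.lower()
                  for s in (Style.BOLD, Style.ITALIC, Style.UNDERLINE)
                  if s in self.style]
        return f"{' '.join(styles) or 'regular'}, {self.size:g} pt, {self.family.value}"
```

Using a flag type keeps any combination of bold, italic, and underline in a single field, which mirrors the "any combination" wording above.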